
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Markmap</title>
<style>
* {
  margin: 0;
  padding: 0;
}
#mindmap {
  display: block;
  width: 100vw;
  height: 100vh;
}
</style>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/markmap-toolbar@0.18.10/dist/style.css">
</head>
<body>
<svg id="mindmap"></svg>
<script src="https://cdn.jsdelivr.net/npm/d3@7.9.0/dist/d3.min.js"></script><script src="https://cdn.jsdelivr.net/npm/markmap-view@0.18.10/dist/browser/index.js"></script><script src="https://cdn.jsdelivr.net/npm/markmap-toolbar@0.18.10/dist/index.js"></script><script>(()=>{setTimeout(()=>{const{markmap:x,mm:K}=window,P=new x.Toolbar;P.attach(K);const F=P.render();F.setAttribute("style","position:absolute;bottom:20px;right:20px"),document.body.append(F)})})()</script><script>((b,L,T,D)=>{const H=b();window.mm=H.Markmap.create("svg#mindmap",(L||H.deriveOptions)(D),T)})(()=>window.markmap,null,{"content":"\n<h2 data-lines=\"0,1\"><strong><code>ROLL</code></strong></h2>","children":[{"content":"\n<h3 data-lines=\"1,2\"><strong>1. Core Pipelines &amp; Orchestration (<code>roll/pipeline</code>)</strong></h3>","children":[{"content":"<code>base_pipeline.py</code>: Abstract base for all pipelines (e.g., training, evaluation, agent interaction loops).","children":[],"payload":{"tag":"li","lines":"2,3"}},{"content":"<code>base_worker.py</code>: Abstract base for specialized workers within a pipeline (e.g., actor, reward model, environment handler).","children":[],"payload":{"tag":"li","lines":"3,4"}},{"content":"\n<h4 data-lines=\"4,5\"><strong><code>agentic/</code> (Agent-Environment Interaction Pipelines)</strong></h4>","children":[{"content":"<code>agentic_pipeline.py</code>: <strong>Orchestrates</strong> the overall agent learning loop (e.g., ReAct, Reflexion style).","children":[{"content":"<strong>Uses:</strong> <code>agentic_config.py</code> for its specific operational parameters.","children":[],"payload":{"tag":"li","lines":"6,7"}},{"content":"<strong>Manages &amp; Delegates to:</strong> <code>environment_worker.py</code> to handle interactions with specific <code>roll/agentic/env/*</code> instances.","children":[],"payload":{"tag":"li","lines":"7,8"}},{"content":"<strong>Leverages:</strong> <code>roll/agentic/rollout/rollout_scheduler.py</code> to efficiently generate agent trajectories (sequences of observations, actions, rewards).","children":[],"payload":{"tag":"li","lines":"8,9"}},{"content":"<strong>Relies on:</strong> <code>roll/distributed/scheduler/*</code> (e.g., <code>generate_scheduler.py</code>) for distributing environment steps and agent action generation tasks across available resources.","children":[],"payload":{"tag":"li","lines":"9,10"}},{"content":"<strong>Relies on:</strong> <code>roll/distributed/strategy/*</code> for executing the agent&apos;s policy model (inference) and potentially for policy updates.","children":[],"payload":{"tag":"li","lines":"10,11"}},{"content":"<strong>Integrates with:</strong> <code>roll/utils/tracking.py</code> for experiment logging, <code>roll/utils/logging.py</code> for system logs.","children":[],"payload":{"tag":"li","lines":"11,12"}}],"payload":{"tag":"li","lines":"5,12"}},{"content":"<code>agentic_config.py</code>: Dataclass holding configuration specific to agentic pipelines.","children":[],"payload":{"tag":"li","lines":"12,13"}},{"content":"<code>environment_worker.py</code>: A specialized worker that instantiates, manages, and steps through one or more <code>roll/agentic/env/*</code> instances.","children":[{"content":"<strong>Communicates with:</strong> <code>roll/distributed/scheduler/*_scheduler.py</code> to receive tasks and return results.","children":[],"payload":{"tag":"li","lines":"14,15"}}],"payload":{"tag":"li","lines":"13,15"}}],"payload":{"tag":"li","lines":"4,15"}},{"content":"\n<h4 data-lines=\"15,16\"><strong><code>rlvr/</code> (Reinforcement Learning from Verifier Feedback Pipelines)</strong></h4>","children":[{"content":"<code>rlvr_pipeline.py</code>: <strong>Orchestrates</strong> the entire RLVR fine-tuning process (e.g., PPO steps).","children":[{"content":"<strong>Uses:</strong> <code>rlvr_config.py</code> for its specific hyperparameters and operational settings.","children":[],"payload":{"tag":"li","lines":"17,18"}},{"content":"<strong>Manages &amp; Delegates to:</strong> <code>actor_worker.py</code> for operations involving the main policy model (the &quot;actor&quot;).","children":[],"payload":{"tag":"li","lines":"18,19"}},{"content":"<strong>Manages &amp; Delegates to:</strong> Various <code>rewards/*_reward_worker.py</code> for operations involving reward/verifier/critic models.","children":[],"payload":{"tag":"li","lines":"19,20"}},{"content":"<strong>Relies heavily on:</strong> <code>roll/distributed/scheduler/*</code> (e.g., <code>generate_scheduler.py</code> for sampling from actor, <code>reward_scheduler.py</code> for scoring, and internal logic for PPO optimization steps) for distributing all major computational stages.","children":[],"payload":{"tag":"li","lines":"20,21"}},{"content":"<strong>Relies heavily on:</strong> <code>roll/distributed/strategy/*</code> for efficient execution of model inference (generation, scoring) and training updates (e.g., using DeepSpeed or Megatron strategies).","children":[],"payload":{"tag":"li","lines":"21,22"}},{"content":"<strong>Utilizes:</strong> <code>roll/utils/kl_controller.py</code> for managing KL divergence in PPO.","children":[],"payload":{"tag":"li","lines":"22,23"}},{"content":"<strong>Integrates with:</strong> <code>roll/utils/tracking.py</code>, <code>roll/utils/logging.py</code>, and <code>roll/utils/checkpoint_manager.py</code> for saving progress.","children":[],"payload":{"tag":"li","lines":"23,24"}}],"payload":{"tag":"li","lines":"16,24"}},{"content":"<code>rlvr_config.py</code>: Dataclass for RLVR-specific configurations.","children":[],"payload":{"tag":"li","lines":"24,25"}},{"content":"<code>actor_worker.py</code>: Manages the language model (actor) that is being fine-tuned.","children":[{"content":"<strong>Interacts with:</strong> A <code>roll/distributed/strategy/*</code> instance for performing generation and receiving training updates.","children":[],"payload":{"tag":"li","lines":"26,27"}}],"payload":{"tag":"li","lines":"25,27"}},{"content":"\n<h4 data-lines=\"27,28\"><code>rewards/</code> (Reward/Verifier Model Workers &amp; Logic)</h4>","children":[{"content":"<strong>Core Function:</strong> These specialized workers are responsible for evaluating the output of the Actor model and generating a scalar reward signal used for PPO/DPO training. Each worker implements a specific verification strategy tailored to different types of tasks and data. They are all scheduled by <code>roll/distributed/scheduler/reward_scheduler.py</code> and execute their logic, which may involve a model call via a <code>roll/distributed/strategy/*</code> instance.","children":[],"payload":{"tag":"li","lines":"28,29"}},{"content":"\n<h4 data-lines=\"29,30\"><strong>LLM-based Verification</strong></h4>","children":[{"content":"<code>llm_judge_reward_worker.py</code>: Implements an &quot;LLM-as-a-Judge&quot; evaluator for tasks where quality is subjective and lacks a ground-truth answer.","children":[{"content":"<strong>Mechanism:</strong> It takes the actor&apos;s generation and formats it into a prompt for a powerful judge model (like GPT-4). This prompt asks the judge to score the generation, compare it against a reference, or pick a winner between two generations.","children":[],"payload":{"tag":"li","lines":"31,32"}},{"content":"<strong>Logic:</strong> The worker&apos;s primary logic is parsing the judge&apos;s textual output to extract a normalized score or preference.","children":[],"payload":{"tag":"li","lines":"32,33"}},{"content":"<strong>Use Case:</strong> Evaluating open-ended generation, creative writing, summarization, and ensuring adherence to complex, nuanced instructions.","children":[],"payload":{"tag":"li","lines":"33,34"}}],"payload":{"tag":"li","lines":"30,34"}}],"payload":{"tag":"li","lines":"29,34"}},{"content":"\n<h4 data-lines=\"34,35\"><strong>Execution-based Verification</strong></h4>","children":[{"content":"<code>code_sandbox_reward_worker.py</code>: Evaluates code generation by executing it in a secure environment.","children":[{"content":"<strong>Mechanism:</strong> It receives a code snippet generated by the actor, writes it to a file, and runs it within a sandboxed environment (e.g., a Docker container) to prevent unsafe operations.","children":[],"payload":{"tag":"li","lines":"36,37"}},{"content":"<strong>Logic:</strong> The reward is calculated based on the execution outcome. This is typically done by running a set of predefined unit tests (<code>pytest</code>) against the generated code. The reward can be binary (all tests pass/fail) or continuous (proportional to the percentage of tests passed).","children":[],"payload":{"tag":"li","lines":"37,38"}},{"content":"<strong>Use Case:</strong> Code generation tasks, code completion, and bug fixing.","children":[],"payload":{"tag":"li","lines":"38,39"}}],"payload":{"tag":"li","lines":"35,39"}}],"payload":{"tag":"li","lines":"34,39"}},{"content":"\n<h4 data-lines=\"39,40\"><strong>Rule-based &amp; Heuristic Verification</strong></h4>","children":[{"content":"<strong>Core Idea:</strong> These workers use deterministic, predefined rules to check for correctness, avoiding the cost and latency of a judge LLM.","children":[],"payload":{"tag":"li","lines":"40,41"}},{"content":"<code>math_rule_reward_worker.py</code>: Designed specifically for mathematical reasoning tasks (e.g., GSM8K).","children":[{"content":"<strong>Logic:</strong> It parses the actor&apos;s full chain-of-thought response to find and extract the final numerical answer, often from a specific format like a LaTeX <code>\\boxed{}</code> expression. It then compares this extracted value against the known ground-truth answer.","children":[],"payload":{"tag":"li","lines":"42,43"}}],"payload":{"tag":"li","lines":"41,43"}},{"content":"<code>ifeval_rule_reward_worker.py</code>: Designed for strict instruction-following tasks.","children":[{"content":"<strong>Logic:</strong> It uses a set of checks (often involving regular expressions) to verify that the actor&apos;s output strictly adheres to all positive and negative constraints laid out in the prompt (e.g., &quot;must contain the word &apos;apple&apos;&quot;, &quot;must not mention the color &apos;red&apos;&quot;).","children":[],"payload":{"tag":"li","lines":"44,45"}}],"payload":{"tag":"li","lines":"43,45"}},{"content":"<code>crossthinkqa_rule_reward_worker.py</code>: Tailored for the CrossThinkQA benchmark, which involves multi-step, complex question answering.","children":[{"content":"<strong>Logic:</strong> Implements the specific evaluation script for this benchmark, which may involve checking for the presence of key entities or logical steps in the final answer.","children":[],"payload":{"tag":"li","lines":"46,47"}}],"payload":{"tag":"li","lines":"45,47"}},{"content":"<code>general_val_rule_reward_worker.py</code>: A general-purpose worker for simple validation tasks.","children":[{"content":"<strong>Logic:</strong> Typically performs straightforward checks like exact string matching between the generated answer and a ground-truth reference or checking for the inclusion of specific keywords. It serves as a baseline rule-based checker.","children":[],"payload":{"tag":"li","lines":"48,49"}}],"payload":{"tag":"li","lines":"47,49"}}],"payload":{"tag":"li","lines":"39,49"}}],"payload":{"tag":"li","lines":"27,49"}}],"payload":{"tag":"li","lines":"15,49"}}],"payload":{"tag":"li","lines":"1,49"}},{"content":"\n<h3 data-lines=\"49,50\"><strong>2. Distributed Execution Framework (<code>roll/distributed</code>)</strong></h3>","children":[{"content":"\n<h4 data-lines=\"50,51\"><strong><code>scheduler/</code> (Ray-based Task Scheduling &amp; Resource Management)</strong></h4>","children":[{"content":"<code>initialize.py</code>: <strong>Crucial entry point</strong> for setting up the Ray distributed environment (connecting to or starting a Ray cluster).","children":[],"payload":{"tag":"li","lines":"51,52"}},{"content":"<code>resource_manager.py</code>: Manages and allocates cluster resources (GPUs, CPUs) to various workers and tasks.","children":[],"payload":{"tag":"li","lines":"52,53"}},{"content":"<code>generate_scheduler.py</code>: Schedules text generation tasks onto appropriate model workers.","children":[{"content":"<strong>Used by:</strong> <code>roll/pipeline/rlvr/rlvr_pipeline.py</code>, <code>roll/pipeline/agentic/agentic_pipeline.py</code>.","children":[],"payload":{"tag":"li","lines":"54,55"}},{"content":"<strong>Interacts with:</strong> <code>roll/distributed/strategy/*</code> instances (running on workers) to dispatch generation requests to the correct model execution backend (e.g., vLLM, SGLang, Megatron).","children":[],"payload":{"tag":"li","lines":"55,56"}}],"payload":{"tag":"li","lines":"53,56"}},{"content":"<code>reward_scheduler.py</code>: Schedules reward computation tasks onto reward model workers.","children":[{"content":"<strong>Used by:</strong> <code>roll/pipeline/rlvr/rlvr_pipeline.py</code>.","children":[],"payload":{"tag":"li","lines":"57,58"}}],"payload":{"tag":"li","lines":"56,58"}},{"content":"<code>decorator.py</code>: Provides Python decorators to easily convert functions into Ray remote tasks or actors, forming the <strong>core abstraction for Ray utilization</strong>.","children":[],"payload":{"tag":"li","lines":"58,59"}},{"content":"<code>protocol.py</code>: Defines data structures (e.g., Pydantic models) for tasks, results, and status updates exchanged between distributed components.","children":[],"payload":{"tag":"li","lines":"59,60"}},{"content":"<code>storage.py</code>, <code>log_monitor.py</code>, <code>driver_utils.py</code>: Supporting utilities for distributed operations, like object storage access, log aggregation, and driver-side helpers.","children":[],"payload":{"tag":"li","lines":"60,61"}}],"payload":{"tag":"li","lines":"50,61"}},{"content":"\n<h4 data-lines=\"61,62\"><strong><code>strategy/</code> (Model Execution Strategies - Bridging Logic to Backends)</strong></h4>","children":[{"content":"<code>strategy.py</code> (Base): Defines the abstract interface that all specific execution strategies must implement (e.g., <code>generate</code>, <code>train_step</code>, <code>get_log_probs</code>).","children":[],"payload":{"tag":"li","lines":"62,63"}},{"content":"<code>factory.py</code>: A factory class responsible for creating instances of specific strategies (e.g., <code>SGLangStrategy</code>, <code>VLLMStrategy</code>, <code>DeepSpeedStrategy</code>) based on configuration.","children":[],"payload":{"tag":"li","lines":"63,64"}},{"content":"<code>sglang_strategy.py</code>, <code>vllm_strategy.py</code>: Strategies for highly optimized LLM inference using SGLang or vLLM.","children":[{"content":"<strong>Directly integrate with and wrap:</strong> <code>roll/third_party/sglang/*</code> or <code>roll/third_party/vllm/*</code> adaptations.","children":[],"payload":{"tag":"li","lines":"65,66"}},{"content":"<strong>Provide:</strong> High-throughput generation capabilities to <code>generate_scheduler.py</code> and other callers.","children":[],"payload":{"tag":"li","lines":"66,67"}}],"payload":{"tag":"li","lines":"64,67"}},{"content":"<code>deepspeed_strategy.py</code>, <code>megatron_strategy.py</code>: Strategies for distributed training (e.g., ZeRO, tensor/pipeline parallelism) and potentially inference using DeepSpeed or Megatron-LM.","children":[{"content":"<strong>Directly integrate with and wrap:</strong> <code>roll/third_party/deepspeed/*</code> or <code>roll/third_party/megatron/*</code> adaptations.","children":[],"payload":{"tag":"li","lines":"68,69"}},{"content":"<strong>Provide:</strong> Methods for executing training steps, performing inference, and calculating log probabilities, primarily used by pipelines like <code>rlvr_pipeline.py</code>.","children":[],"payload":{"tag":"li","lines":"69,70"}},{"content":"<strong>Utilize:</strong> <code>roll/models/model_providers.py</code> to load the underlying HuggingFace or Megatron models.","children":[],"payload":{"tag":"li","lines":"70,71"}},{"content":"<strong>Often use:</strong> <code>roll/utils/deepspeed_utils.py</code> or similar helpers.","children":[],"payload":{"tag":"li","lines":"71,72"}}],"payload":{"tag":"li","lines":"67,72"}},{"content":"<code>hf_strategy.py</code>: A strategy that uses standard HuggingFace Transformers capabilities for model execution.","children":[],"payload":{"tag":"li","lines":"72,73"}}],"payload":{"tag":"li","lines":"61,73"}},{"content":"\n<h4 data-lines=\"73,74\"><strong><code>executor/</code> (Core Distributed Workers &amp; Cluster Abstraction)</strong></h4>","children":[{"content":"<code>worker.py</code>: Represents a generic distributed worker process/actor (typically a Ray actor) that hosts a model and its associated execution <code>strategy/*</code> instance.","children":[{"content":"<strong>Instantiated and managed by:</strong> The <code>*_scheduler.py</code> modules.","children":[],"payload":{"tag":"li","lines":"75,76"}},{"content":"<strong>Executes:</strong> Operations defined by its <code>strategy/*</code> (e.g., a forward pass on a model).","children":[],"payload":{"tag":"li","lines":"76,77"}}],"payload":{"tag":"li","lines":"74,77"}},{"content":"<code>model_update_group.py</code>: Manages parameter synchronization and gradient aggregation across workers in distributed training scenarios.","children":[],"payload":{"tag":"li","lines":"77,78"}},{"content":"<code>cluster.py</code>: Provides an abstraction for the underlying cluster environment, potentially for node discovery or health checks.","children":[],"payload":{"tag":"li","lines":"78,80"}}],"payload":{"tag":"li","lines":"73,80"}}],"payload":{"tag":"li","lines":"49,80"}},{"content":"\n<h3 data-lines=\"80,81\"><strong>3. Agentic Framework (<code>roll/agentic</code>)</strong></h3>","children":[{"content":"\n<h4 data-lines=\"81,82\"><strong><code>env/</code> (Agent Interaction Environments)</strong></h4>","children":[{"content":"<code>base.py</code>: Defines the abstract base class for all agent environments (e.g., defining <code>step</code>, <code>reset</code> methods, observation/action spaces).","children":[],"payload":{"tag":"li","lines":"82,83"}},{"content":"Specific environments (<code>sokoban/</code>, <code>metamathqa/</code>, <code>webshop/</code>, <code>frozen_lake/</code>, etc.): Implement the logic and state for various tasks the agent can interact with.","children":[{"content":"<strong>Instantiated and managed by:</strong> <code>roll/pipeline/agentic/environment_worker.py</code> within an agentic pipeline.","children":[],"payload":{"tag":"li","lines":"84,85"}}],"payload":{"tag":"li","lines":"83,85"}}],"payload":{"tag":"li","lines":"81,85"}},{"content":"\n<h4 data-lines=\"85,86\"><strong><code>rollout/</code> (Trajectory Generation &amp; Agent Interaction Logic)</strong></h4>","children":[{"content":"<code>rollout_scheduler.py</code>: Manages the efficient batching and execution of agent steps across multiple environment instances to collect experience trajectories.","children":[{"content":"<strong>A core component of:</strong> <code>roll/pipeline/agentic/agentic_pipeline.py</code>.","children":[],"payload":{"tag":"li","lines":"87,88"}},{"content":"<strong>Coordinates interaction between:</strong> <code>environment_worker.py</code> (for environment stepping) and the agent&apos;s policy (executed via a <code>roll/distributed/strategy/*</code>) for action selection.","children":[],"payload":{"tag":"li","lines":"88,89"}}],"payload":{"tag":"li","lines":"86,89"}},{"content":"<code>es_manager.py</code>: Likely provides Evolution Strategies (ES) capabilities, possibly for parameter-space exploration or non-gradient-based policy improvement.","children":[],"payload":{"tag":"li","lines":"89,90"}}],"payload":{"tag":"li","lines":"85,90"}},{"content":"<code>utils.py</code> (in <code>roll/agentic</code>): Contains utility functions specifically tailored for the agentic framework.","children":[],"payload":{"tag":"li","lines":"90,92"}}],"payload":{"tag":"li","lines":"80,92"}},{"content":"\n<h3 data-lines=\"92,93\"><strong>4. Models &amp; Data Handling (<code>roll/models</code>, <code>roll/datasets</code>)</strong></h3>","children":[{"content":"\n<h4 data-lines=\"93,94\"><strong><code>models/</code></strong></h4>","children":[{"content":"<code>model_providers.py</code>: <strong>A key central module responsible for loading various types of models</strong> (HuggingFace Transformers, Megatron-LM models) from checkpoints or hubs. It makes these models available to the <code>roll/distributed/strategy/*</code> modules for execution.","children":[],"payload":{"tag":"li","lines":"94,95"}},{"content":"<code>trl_patches.py</code>: Patches or extensions for the HuggingFace TRL library, likely utilized by RLHF/RLVR pipelines (e.g., <code>rlvr_pipeline.py</code>) for reference models or PPO components.","children":[],"payload":{"tag":"li","lines":"95,96"}},{"content":"<code>func_providers.py</code>: Likely exposes specific model functionalities (beyond simple generation/training) as callable services or APIs.","children":[],"payload":{"tag":"li","lines":"96,97"}}],"payload":{"tag":"li","lines":"93,97"}},{"content":"\n<h4 data-lines=\"97,98\"><strong><code>datasets/</code></strong></h4>","children":[{"content":"<code>loader.py</code>: Handles the loading of raw data from various sources (files, databases, etc.).","children":[],"payload":{"tag":"li","lines":"98,99"}},{"content":"<code>sampler.py</code>: Implements different data sampling strategies, crucial for distributed training to ensure each worker gets appropriate data shards.","children":[],"payload":{"tag":"li","lines":"99,100"}},{"content":"<code>collator.py</code>: Takes individual data samples and batches them together, applying necessary padding and formatting to create model-ready input tensors.","children":[],"payload":{"tag":"li","lines":"100,101"}},{"content":"<code>chat_template.py</code>: Applies specific formatting templates to dialogue data to make it suitable for chat-based models.","children":[],"payload":{"tag":"li","lines":"101,102"}},{"content":"(These modules <strong>form the input pipeline that feeds data to</strong> training loops orchestrated by <code>roll/distributed/strategy/*</code> modules, or for evaluation data used by reward models in <code>roll/pipeline/rlvr/rewards/*</code>).","children":[],"payload":{"tag":"li","lines":"102,104"}}],"payload":{"tag":"li","lines":"97,104"}}],"payload":{"tag":"li","lines":"92,104"}},{"content":"\n<h3 data-lines=\"104,105\"><strong>5. Configuration System (<code>roll/configs</code>)</strong></h3>","children":[{"content":"<code>base_config.py</code> (and others like <code>training_args.py</code>, <code>generating_args.py</code>, <code>model_args.py</code>, <code>data_args.py</code>, <code>worker_config.py</code>):","children":[],"payload":{"tag":"li","lines":"105,106"}},{"content":"These files define dataclass-based structures for all configurable aspects of the framework. They are <strong>pervasively loaded and used by virtually all other modules</strong> to control behavior, specify paths, set hyperparameters, etc. Typically, these are parsed from YAML or JSON configuration files at the start of a pipeline (e.g., by scripts like <code>examples/start_rlvr_pipeline.py</code>).","children":[],"payload":{"tag":"li","lines":"106,108"}}],"payload":{"tag":"li","lines":"104,108"}},{"content":"\n<h3 data-lines=\"108,109\"><strong>6. Third-Party Integrations &amp; Colocation Management (<code>roll/third_party</code>)</strong></h3>","children":[{"content":"<strong>Primary Goal:</strong> To enable <strong>model colocation</strong> (running multiple models like a large actor and a reward model on the same GPU resources without memory conflicts). This is achieved by implementing <strong>explicit on-load/off-load mechanisms</strong>, where the <code>roll</code> pipeline can dynamically move model parameters and states between GPU VRAM and CPU RAM/NVMe storage as needed.","children":[],"payload":{"tag":"li","lines":"109,110"}},{"content":"\n<h4 data-lines=\"110,111\"><strong><code>deepspeed/</code> &amp; <code>megatron/</code> (For Distributed Training &amp; Inference)</strong></h4>","children":[{"content":"<strong>Mechanism:</strong> Implements state offloading for training-centric backends. This is crucial in the <code>rlvr</code> pipeline, where the system can off-load the large actor model to free up VRAM, then on-load the reward model for scoring, and vice versa, enabling a cyclical use of the same GPU hardware.","children":[],"payload":{"tag":"li","lines":"111,112"}},{"content":"<strong>On-Load:</strong> Before a <code>train_step</code> or <code>generate</code> call, the <code>DeepSpeedStrategy</code> or <code>MegatronStrategy</code> uses these functions to move the required model&apos;s parameters onto the GPU.","children":[],"payload":{"tag":"li","lines":"112,113"}},{"content":"<strong>Off-Load:</strong> After the computation, the weights are moved back to CPU RAM, freeing up VRAM.","children":[],"payload":{"tag":"li","lines":"113,114"}}],"payload":{"tag":"li","lines":"110,114"}},{"content":"\n<h4 data-lines=\"114,115\"><strong><code>vllm/</code> &amp; <code>sglang/</code> (For High-Throughput Inference)</strong></h4>","children":[{"content":"<strong>Mechanism:</strong> Implements dynamic model lifecycle management within the inference engine. This allows for efficient colocation where, for instance, a large actor model can be served by the high-performance vLLM/SGLang engine for a generation batch, then be completely torn down from the GPU to make space for a reward model (which might be served by a simpler <code>hf_strategy</code>) to perform its evaluation.","children":[],"payload":{"tag":"li","lines":"115,116"}},{"content":"<strong>On-Load:</strong> The <code>VLLMStrategy</code> or <code>SGLangStrategy</code> will initialize the entire inference engine with the specific model weights. This process allocates VRAM for the model and its KV cache.","children":[],"payload":{"tag":"li","lines":"116,117"}},{"content":"<strong>Off-Load:</strong> The strategy will then completely destroy the engine instance, releasing all associated VRAM (model weights, KV cache, etc.).","children":[],"payload":{"tag":"li","lines":"117,120"}}],"payload":{"tag":"li","lines":"114,120"}}],"payload":{"tag":"li","lines":"108,120"}},{"content":"\n<h3 data-lines=\"120,121\"><strong>7. Core Utilities &amp; Cross-Cutting Concerns (<code>roll/utils</code>)</strong></h3>","children":[{"content":"<code>logging.py</code>: Provides a centralized logging mechanism. <strong>(Used extensively across all modules)</strong>.","children":[],"payload":{"tag":"li","lines":"121,122"}},{"content":"<code>checkpoint_manager.py</code>: Handles the complexities of saving and loading model states, optimizer states, and pipeline progress, especially in distributed settings. <strong>(Critical for <code>roll/pipeline/*</code> and <code>roll/distributed/strategy/*</code>)</strong>.","children":[],"payload":{"tag":"li","lines":"122,123"}},{"content":"<code>tracking.py</code>: Integrates with experiment tracking tools (like Weights &amp; Biases or MLflow) to log metrics, configurations, and artifacts. <strong>(Primarily used by <code>roll/pipeline/*</code>)</strong>.","children":[],"payload":{"tag":"li","lines":"123,124"}},{"content":"<code>kl_controller.py</code>: Implements logic for controlling KL divergence between policy updates, a key component of PPO. <strong>(Specifically used by <code>roll/pipeline/rlvr/rlvr_pipeline.py</code>)</strong>.","children":[],"payload":{"tag":"li","lines":"124,125"}},{"content":"<code>ray_utils.py</code>: Contains helper functions and common patterns for working with the Ray framework. <strong>(Heavily used by modules in <code>roll/distributed/*</code>)</strong>.","children":[],"payload":{"tag":"li","lines":"125,126"}},{"content":"<code>deepspeed_utils.py</code>: Helper functions specific to DeepSpeed integration. <strong>(Used by <code>roll/distributed/strategy/deepspeed_strategy.py</code> and <code>roll/third_party/deepspeed/*</code>)</strong>.","children":[],"payload":{"tag":"li","lines":"126,127"}},{"content":"<code>collective/</code> (e.g., <code>collective.py</code>, <code>pg_utils.py</code>): Wrappers or utilities for distributed collective communication primitives (e.g., all-reduce, broadcast) if not directly handled by the underlying strategy&apos;s library (like DeepSpeed). <strong>(May be used by <code>roll/distributed/strategy/*</code>)</strong>.","children":[],"payload":{"tag":"li","lines":"127,128"}},{"content":"<code>metrics/metrics_manager.py</code>: A system for defining, collecting, aggregating, and reporting various performance metrics. <strong>(Used by <code>roll/pipeline/*</code>, <code>roll/distributed/strategy/*</code>, and potentially <code>roll/agentic/rollout/*</code>)</strong>.","children":[],"payload":{"tag":"li","lines":"128,129"}},{"content":"(Other utilities like <code>send_recv_utils.py</code> for data transfer, <code>offload_states.py</code> for memory optimization, <code>prompt.py</code> for prompt engineering, <code>context_managers.py</code> for resource handling, provide specific, reusable functionalities to various components throughout the codebase).","children":[],"payload":{"tag":"li","lines":"129,130"}}],"payload":{"tag":"li","lines":"120,130"}}],"payload":{"tag":"li","lines":"0,130"}},null)</script>
</body>
</html>
